Hierarchical phrase-based translation with weighted finite state transducers
Abstract
This dissertation focuses on Statistical Machine Translation (SMT), particularly on hierarchical phrase-based translation frameworks. We first study and redesign hierarchical models using several filtering techniques. Hierarchical search spaces are based on automatically extracted translation rules. As originally defined, they are too big to handle directly without filtering. In this thesis we create more space-efficient models, aiming at faster decoding times at no cost in performance. We propose more refined strategies such as pattern filtering and shallow-n grammars. The aim is to reduce the search space a priori as much as possible without losing performance (or even while improving it), so that search errors are avoided. We also propose a new algorithm in the hierarchical phrase-based machine translation framework, called HiFST. For the first time, as far as we are aware, an SMT system successfully combines knowledge from two other research areas simultaneously: parsing and weighted finite-state technology. In this way we are able to build a more efficient decoding tool that takes advantage of both worlds: the capability of deep syntactic reordering with parsing, and the compact representation and powerful semiring operations of weighted finite-state transducers. Combined with our findings for hierarchical grammars, we are able to build search-error-free translation systems with state-of-the-art performance.
Similar resources
Hierarchical Phrase-Based Translation with Weighted Finite-State Transducers and Shallow-n Grammars
In this article we describe HiFST, a lattice-based decoder for hierarchical phrase-based translation and alignment. The decoder is implemented with standard Weighted Finite-State Transducer (WFST) operations as an alternative to the well-known cube pruning procedure. We find that the use of WFSTs rather than k-best lists requires less pruning in translation search, resulting in fewer search err...
A phrase-level machine translation approach for disfluency detection using weighted finite state transducers
We propose a novel algorithm to detect disfluency in speech by reformulating the problem as phrase-level statistical machine translation using weighted finite state transducers. We approach the task as translation of noisy speech to clean speech. We simplify our translation framework such that it does not require fertility and alignment models. We tested our model on the Switchboard disfluency-...
A Finite-State Approach to Phrase-Based Statistical Machine Translation
This paper presents a finite-state approach to phrase-based statistical machine translation where a log-linear modelling framework is implemented by means of an on-the-fly composition of weighted finite-state transducers. Moses, a well-known state-of-the-art system, is used as a machine translation reference in order to validate our results by comparison. Experiments on the TED corpus achieve a...
European Language Translation with Weighted Finite State Transducers: The CUED MT System for the 2008 ACL Workshop on SMT
We describe the Cambridge University Engineering Department phrase-based statistical machine translation system for Spanish-English and French-English translation in the ACL 2008 Third Workshop on Statistical Machine Translation Shared Task. The CUED system follows a generative model of translation and is implemented by composition of component models realised as Weighted Finite State Transducer...
Publication date: 2009